This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Naming conventions for syntax scopes #564

Open. chbk wants to merge 5 commits into master.

@chbk chbk commented Sep 21, 2019

Motivations for this PR

The lack of documentation on syntax scopes has been an issue for many years and still hinders the progress of contributors who wish to create themes and grammar packages for Atom. An attempt at fixing this was made in 2015 but hasn't evolved since.

To resolve this ongoing issue, I propose this documentation based on Textmate scopes. Most of the conventional scopes have been preserved for backwards-compatibility with existing themes. However, I have also introduced new scopes and removed redundant ones. This documentation aims to be:

  • Explicit. The more language-agnostic scopes we provide, the better we prevent scope conflicts between language grammars and the easier it becomes to create universal themes.
  • Consistent. Having the same elements scoped the same way throughout a document promotes coherent coloring and makes code intelligible.

Changes from the Textmate Scopes

Additions

keyword.storage.declaration

The storage.type scope is too general. The lack of specificity results in this scope being misused, e.g. let scoped as storage.type in JavaScript, or just blatantly ignored, e.g. func scoped as keyword.function in Go or enum scoped as keyword.control in C. More specific scopes to differentiate variable type keywords from declaration keywords are needed. Keywords that declare an entity (union, struct, function, def, var, mod, etc.) are now scoped keyword.storage.declaration. Keywords that specify a variable's type are now scoped keyword.type.

entity.type

Type highlighting must stay consistent across a document. Any identifier that can be used as a type must be scoped as such.

entity.type.support

This scope targets built-in types that aren't keywords. It also includes built-in class types such as String, List, Exception.

entity.type.fundamental(.support)

More precise than entity.type.support, this scope targets fundamental primitive and compound types (int, bool, list, map, etc.) that aren't keywords.

entity.operator

To stay coherent with the keyword scope, this scope is used in languages that support non-keyword operators.

keyword.type, keyword.variable, keyword.function

The four main entity scopes (variable, function, operator, type) are given a keyword equivalent for reserved tokens.

symbolic

For themes that wish to target non-alphabetic tokens and highlight them the same way regardless of their more specific scopes.

punctuation

Punctuation is part of language grammars and must not be neglected.

text

Unformatted text in HTML and Markdown requires a scope as well. markup is specifically used for stylized text, so a text scope is needed to avoid leaving any tokens unscoped.

string.part

To give more general control over string highlighting, this scope encompasses parts of strings such as interpolations, placeholders, format specifiers etc.

comment.part

Parts of comments worthy of highlighting such as captions and documented terms are also given a general scope.

More specific scopes

Every language has its own vocabulary, and that creates conflicts between scopes. Let's consider the module scope as an example. Depending on the language, a module can be a namespace (Rust), an imported package (Python), a mixin class (Ruby)... The module scope becomes meaningless for syntax themes without the language context. We can resolve this by providing pre-defined scopes that call similar elements by the same name, regardless of the language. In this case, these scopes would be namespace for a collection of entities, package for an imported element, and class.mixin for a mixin class.

Adjustments

entity

The original definition from the Textmate docs reads "An entity refers to a larger part of the document [...]. We do not scope the entire entity as entity.* (we use meta.* for that). But we do use entity.* for the placeholders in the larger entity [...]". This leads to entity being solely used to scope identifiers and re-used whenever the identifiers re-appear in the document. Hence, the practical definition for entity is simply "an identifier".

storage

This scope encompasses "Things relating to “storage”" according to the original Textmate documentation. This catch-all definition is useless for themes that want to differentiate declaration keywords, keyword types, support types, and other defined types. To resolve this ambiguity, this scope is now organized into different categories: keyword.storage for storage keywords, keyword.type for non-redefinable keyword types, entity.type for non-keyword types, entity.type.support for built-in non-keyword types, entity.type.fundamental.support for primitive and composite non-keyword types.

support

Using this scope must not exempt one from also using other relevant scopes. Therefore support is no longer its own standalone category but is appendable to other scopes when appropriate. Furthermore, the original definition for this scope was "things provided by a framework or library". But in practice it is mostly used for built-in entities rather than imported ones. This justifies changing the definition to "a built-in or imported or conventional token that can usually be redefined".

invalid

An invalid token can either be a deprecated token or an illegal token. Similarly to support, it must be used with other relevant scopes when applicable. Often these scopes would be entity or keyword or punctuation.

variable

A variable is an entity. Similarly to support it must not be used as a standalone scope. It is now nested under the entity or keyword scopes and can be used along with more specific scopes such as parameter, argument, support, member, etc. which gives more control to themes to fine-tune their highlighting.

Deletions

other

This scope is used too liberally and contributes no information.

name in entity.name

Adding name to entity is redundant as the latter is exclusively used to scope named elements.

java, python, etc.

Trailing scopes identifying the language are redundant as this information is provided by the root scope, e.g. source.java.
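
To make the three deletions concrete, here is a minimal Python sketch of how a legacy dotted scope could be trimmed. It is only an illustration and not part of this PR: the helper name `apply_deletions` and the sample list of language identifiers are assumptions, and the sketch does not perform any of the additions or adjustments described above (for instance, it does not nest variable under entity).

```python
# Hypothetical helper for illustration only; not part of the PR.
# Sample language identifiers (an assumed, incomplete list).
LANGUAGE_SUFFIXES = {"java", "python", "js", "c", "go", "ruby"}


def apply_deletions(scope: str) -> str:
    """Apply only the three deletions (language suffix, other, entity.name)."""
    parts = scope.split(".")
    # Root scopes such as source.java keep their language identifier.
    if parts[0] == "source":
        return scope
    # Drop a trailing language identifier, e.g. entity.name.function.java.
    if len(parts) > 1 and parts[-1] in LANGUAGE_SUFFIXES:
        parts = parts[:-1]
    # Drop the uninformative "other" class wherever it appears.
    parts = [part for part in parts if part != "other"]
    # Drop "name" when it directly follows "entity".
    if len(parts) > 1 and parts[0] == "entity" and parts[1] == "name":
        del parts[1]
    return ".".join(parts)


print(apply_deletions("entity.name.function.java"))
print(apply_deletions("keyword.other.python"))
```

A real migration would need a per-grammar mapping on top of this, since most of the proposed changes are renames rather than deletions.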

Amendments to the Documentation

I propose adding the following section to the Hacking Atom chapter of the Atom flight manual. I use "class" instead of "scope" below to conform to the terminology used in existing documentation.

Syntax Naming Conventions

Naming conventions provide predefined names for similar tokens in different languages. These language-agnostic names, called syntax classes, help themes highlight code without relying on language-specific vocabulary.

When creating a language grammar, use these conventions to determine which syntax classes correspond to your syntax nodes. When creating a syntax theme, use these conventions to determine which syntax classes correspond to your theme colors.

Guidelines for Language Grammars

The syntax classes are organized into nested categories. In a language grammar, multiple nested classes can be applied to a syntax node by appending them with periods, as in entity.function.call. The order in which classes are appended does not affect the resulting highlighting. However, we recommend following their hierarchy to stay consistent.

Root classes, like entity, must not be appended to other root classes, unless explicitly allowed in this documentation. Main classes indicated in bold, like function, must be used for coherent highlighting. Shared classes indicated in brackets, like [call], can be appended when relevant. You may create additional classes if they clarify your grammar and do not introduce conflicts with existing classes.
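
As a rough model of why the order of appended classes does not matter, the following Python sketch treats a syntax node's classes as a set and a theme selector as a subset test. This is a simplification made for illustration: the function names are hypothetical, and the sketch ignores CSS specificity and nested elements, which a real stylesheet engine also takes into account.

```python
# Illustrative model only: classes on a node behave like a set,
# so a selector matches when all of its classes are present.
def classes_of(scope: str) -> frozenset:
    """Split a dotted class list such as entity.function.call into a set."""
    return frozenset(scope.split("."))


def selector_matches(selector: str, scope: str) -> bool:
    """True when every class named in the selector is present on the node."""
    return classes_of(selector) <= classes_of(scope)


node = "entity.function.call"
print(selector_matches("entity.function", node))
print(selector_matches("function.entity", node))
print(selector_matches("entity.variable", node))
```

Following the documented hierarchy when writing classes is therefore a matter of consistency for readers, not a requirement for matching.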

Guidelines for Syntax Themes

In a syntax theme, styling can be applied to a syntax class with the syntax-- prefix, as in syntax--entity. When targeting a nested class, specify its parent classes by prepending them with periods, as in syntax--entity.syntax--function.syntax--call. A typical styling rule would look like this:

.syntax--entity.syntax--function.syntax--call {
  color: blue;
}
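
The mapping from a syntax class to its selector is mechanical, so it can be sketched in a few lines of Python. The helper names `to_selector` and `to_rule` are hypothetical and only illustrate the prefixing rule described above; they are not an Atom API.

```python
# Hypothetical helpers illustrating the syntax-- prefixing rule.
def to_selector(syntax_class: str) -> str:
    """Turn entity.function.call into .syntax--entity.syntax--function.syntax--call."""
    return "".join(f".syntax--{part}" for part in syntax_class.split("."))


def to_rule(syntax_class: str, color: str) -> str:
    """Build a one-property styling rule for the given syntax class."""
    return f"{to_selector(syntax_class)} {{\n  color: {color};\n}}"


print(to_rule("entity.function.call", "blue"))
```

Running this prints the same rule as the example above.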

List of Syntax Classes

keyword
  • keyword — A keyword.
    • [symbolic] — A keyword with no alphabetic characters.
    • control — A control or structure keyword.
      • condition — A condition keyword.
        Examples: if, else, elif.
      • loop — A loop keyword.
        Examples: for, for...in, for...of, do, while.
      • exception — An exception keyword.
        Examples: try, catch, finally.
      • jump — A keyword used to jump to/from a statement.
        Examples: break, continue, pass, return, yield, throw, await, defer.
      • package — A keyword for imports or exports.
        Examples: import, from, include, export, require.
      • directive — An instruction given to the compiler.
        Examples: #include, #define, #ifdef, using, package, use strict.
      • evaluate — A keyword used to evaluate an expression.
        Examples: assert, with...as.
    • storage — A storage keyword.
      • modifier — A keyword to detail the behavior of an entity.
        Examples: static, abstract, final, throws, get, extends.
      • declaration — A keyword to declare an entity.
        Examples: let, const, func, def, end, class, enum, typedef, namespace.
    • type — A type keyword.
      Examples: char, int, bool, void.
      • wildcard — A wildcard keyword for an unknown type.
        Example: ? in List<?> list.
    • operator — An operator keyword. Includes overloaded operators.
      • logical — A logical operator.
        Examples: and, not, or, !, &&, ||.
      • ternary — A ternary condition operator.
        Examples: ?, :.
      • assignment (compound) — An assignment operator.
        Examples: =, :=, +=, -=, *=, %=.
      • comparison — A comparison operator.
        Examples: ==, <, >, !=, in, instanceof.
      • arithmetic — An arithmetic operator.
        Examples: +, -, /, *, @, ++, --.
      • pointer (reference) (dereference) — A pointer operator.
        Examples: &, *.
      • bitwise — A bitwise operator.
        Examples: <<, >>, |, &, ^, ~.
      • instance — An instance operator.
        Examples: del, delete, new, typeof.
      • composition — A composition operator (Haskell).
        Example: ..
      • combinator — A combinator operator (CSS).
        Examples: >, +, ~, &.
    • function — A function keyword.
      Example: super.
    • variable — A variable keyword.
      Examples: this, self, @.
entity
  • entity — An identifier.
    • [parameter] — A parameter in a definition or declaration or call.
      Examples: myFunction(parameter = argument), class MyClass<parameter> {}.
    • [argument] — An argument in a call.
      Examples: instance.method(argument), new MyClass(argument).
    • [definition] — An entity that is being defined or declared.
      Examples: my_variable in let my_variable, myFunction in def myFunction().
    • [call] — An entity that is being called.
      Examples: myFunction in myFunction(argument), MyClass in new MyClass(argument).
    • [mutable] — An entity whose properties or value can be changed.
      Examples: var mutable, let mutable.
    • [immutable] — An entity whose properties or value cannot be changed.
      Examples: const immutable, final immutable.
    • [support] — A built-in or imported or conventional entity that can usually be redefined.
      Examples: self, cls, arguments, iota, len, print, loop, int, list, bool.
    • variable — A variable.
      • member — A member variable in an object.
        Examples: {property: value}, object.attribute.
    • function — A function.
      • cast — A type casting function that is not a type itself.
        Examples: as.matrix(), toInt().
      • method (constructor) — A method in an object.
        Examples: {method: (parameter) => value}, object.method().
      • lambda — A lambda.
        Example: lambda = ->() {}.
      • infix — A function used in infix notation.
        Example: 1 `function` 2.
    • operator — An operator.
      • [symbolic] — An operator with no alphabetic characters.
        Examples: %>%, <+>.
    • type — A type.
      • [cast] — A type used for type casting, possibly in functional notation.
        Examples: float(32), int(3.2), matrix().
      • [constructor] — A type used as an instance constructor, possibly in functional notation.
        Example: new MyClass().
      • fundamental — A fundamental primitive or composite type.
        Examples: char, int, bool, rune, list, map, tuple.
      • class — A class.
        Examples: MyClass, String, List.
        • inherited — An inherited class.
          Example: class Child < Inherited.
        • mixin — A mixin class.
          Example: module Mixin (Ruby).
        • generic — A generic class.
          Examples: <T>, <E>.
        • exception — An exception.
          Example: AssertionError.
        • abstract — An abstract class.
          Example: abstract class Abstract (Java).
      • interface — An interface.
        Example: Vehicle in public interface Vehicle {}.
      • enumeration — An enumeration.
        Example: Color in enum Color{red, green, blue}.
      • structure — A structure.
        Example: Address in type Address struct {}.
      • union — A union.
        Example: IPv4 in union IPv4 {}.
      • alias — An alias.
        Example: Number in typedef int Number.
    • annotation — An annotation.
      Examples: @Override (Java), #[test] (Rust), [Obsolete] (C#).
    • namespace — A namespace.
      Examples: namespace Namespace {} (C++), namespace::function() (Rust).
    • package — A package.
      Example: from package import entity.
    • label — A statement label.
      Example: goto label.
    • lifetime — A lifetime (Rust).
      Example: 'static.
    • tag — A tag (HTML).
      Examples: body, div, input.
    • attribute (id) (class) — An attribute (HTML).
      Example: <tag attribute=value>.
    • property — A property (CSS).
      Example: {property: value}.
    • selector (tag) (id) (class) (pseudo-...) (attribute) — A selector (CSS).
      Examples: #id, .class, :hover, :not, ::before, ::after, [attribute].
string
  • string — A string or part of a string.
    • [argument] — An argument in a call.
      Examples: myFunction("string"), new MyClass('string').
    • [mutable] — A mutable string. Specified when mutable and immutable coexist.
      Example: 'string' (Ruby).
    • [immutable] — An immutable string. Specified when mutable and immutable coexist.
      Example: :immutable (Ruby).
    • [key] — A key in a key-value pair.
      Example: {"key" => value}.
    • [quoted] — A quoted string.
      Examples: "string", 'string', $"template string", /^regexp string$/.
    • [unquoted] — An unquoted string.
      Example: 'key': unquoted.
    • [part] — A part of a string.
      • interpolation — An interpolation.
        Examples: ${variable}, {variable:<30}.
      • placeholder — A placeholder.
        Examples: %()d, {0:.2f}, %-#10x, \1.
      • format — A format specifier.
        Examples: <30, d, .2f, -#10x.
    • regexp — A regular expression.
      Example: /^regexp$/.
      • [part] — A part of a regular expression.
        • language — A regular expression keyword.
          • [symbolic] — A keyword with no alphabetic characters.
          • control (anchor) (reference) (mode) — A control token.
            Examples: ^, $, \b, \k, \1, i in (?i), g in /^regexp$/g.
          • operator (quantifier) — A quantifier operator.
            Examples: ?, *, +, {1,2}.
        • variable — A regular expression variable.
          Examples: (?<variable>), \k<variable>.
        • group — A regular expression group.
          Examples: (capture), (?:non-capture).
        • lookaround — A regular expression lookaround.
          Example: (?=lookahead).
        • set — A regular expression set.
          Example: [^A-Z].
    • template — A template string.
      Examples: $"string {interpolation}", `string ${interpolation}`.
    • heredoc — A here document.
      Example: <<EOF A multiline here document EOF.
constant
  • constant — A literal other than a string.
    • [argument] — An argument in a call.
      Examples: myFunction(constant), float(constant).
    • [key] — A key in a key-value pair.
      Example: {key: value}.
    • [quoted] — A quoted constant.
      Example: 'a'.
    • [unquoted] — An unquoted constant.
      Example: #color.
    • [support] — A built-in or imported or conventional constant.
      Examples: absolute, blue, screen.
    • [language] — A literal keyword.
      • [symbolic] — A keyword with no alphabetic characters.
        Example: ... (Python).
      • boolean — A boolean.
        Examples: true, false.
      • null — A null value.
        Examples: None, null, nil.
      • undefined — An undefined value.
        Example: undefined.
      • numeric — A numeric word.
        Example: Infinity.
    • numeric — A number.
      • integer — An integer.
        Example: 2.
      • decimal — A decimal number.
        Example: .17.
      • hexadecimal — A hexadecimal number.
        Example: 0x29.
      • unit — A length unit (CSS).
        Examples: %, px, pt, em.
      • duration — A duration (Lilypond).
        Examples: 8, 2..
    • character — A character.
      Example: 'a'.
      • [escape] — An escape sequence.
        Examples: \", \\, \i, \?, \u2661, \n, \d, \W.
      • code — A substitute for another character.
        Examples: &lt;, \x2f, \n.
        • shorthand — A shorthand for other characters (RegExp).
          Examples: ., \d, \W, \s.
        • range — A range of characters (RegExp).
          Examples: a-z, 0-9.
        • whitespace — A whitespace character.
          Examples: \t, \f.
          • newline — A newline character.
            Examples: \n, \r.
        • unicode — A unicode code point.
          Example: \u2661.
        • hexadecimal — A hexadecimal code point.
          Example: \x2f.
        • octal — An octal code point.
          Example: \143.
    • color — A color (CSS).
      Examples: crimson, #a35.
      • prefix — A color prefix.
        Example: #.
    • font — A font (CSS).
      Examples: Helvetica, Times New Roman.
    • style — A style (CSS).
      Examples: break-word, solid, absolute.
    • note — A note (Lilypond).
      Examples: c, d', a,,.
      • rest — A rest (Lilypond).
        Example: r.
text
  • text — Plain text (HTML, Markdown).
markup
  • markup — Stylized text (Markdown).
    • heading — A heading.
      Example: # Heading.
    • list — A list item.
      Examples: 1. item, - item.
    • quote — A quote.
      Example: > quote.
    • bold — Bold text.
      Example: **bold**.
    • italic — Italic text.
      Example: *italic*.
    • underline — Underlined text.
      Example: __underline__.
    • strike — Struck-through text.
      Example: ~~strike~~.
    • raw — Raw unformatted text or code.
      Example: `raw`.
    • link — A URL or path or reference.
      Examples: url.com, (path) in [alt](path), [reference].
    • alt — Alternate text for a link.
      Examples: [alt], ![alt].
    • critic — A critic.
      • inserted — An insertion.
        Example: {++ inserted ++}.
      • deleted — A deletion.
        Example: {-- deleted --}.
      • changed — A modification.
        Example: {~~ from ~> to ~~}.
      • commented — A comment.
        Example: {>> commented <<}.
      • highlighted — A highlight.
        Example: {== highlighted ==}.
comment
  • comment — A comment or part of a comment. Includes comments in strings.
    Examples: (?# comment), <!-- comment -->, /* comment */.
    • [part] — A part of a comment.
      • caption — A caption in a comment.
        Examples: @param, <returns>, NOTE, TODO, FIXME.
      • path — A path in a comment.
        Example: path/to/my-file.
      • term (variable) (function) (operator) (type) — A documented entity.
        Examples: type and variable in @param {type} variable.
    • line — A one-line comment.
      Example: # comment.
    • block — A multi-line comment.
      Example: /* ... */.
punctuation
  • punctuation — A punctuation mark.
    • definition — Punctuation that defines tokens.
      • string — Punctuation for a string.
        Examples: ", ', $".
        • regexp — Punctuation for a regular expression.
          Examples: r", /.
      • constant — Punctuation for a constant.
        • character — Punctuation for a character.
          Example: '.
      • markup — Punctuation for text styling (Markdown).
        Examples: _, *, ~, #, -, 1., [, ].
      • comment — Punctuation for a comment.
        Examples: //, #, <!--, -->.
      • collection — Punctuation for a collection (array, set, map, etc.).
        Examples: [, ], {, }.
      • variable — Punctuation for a variable.
        Example: $.
      • function (generator) — Punctuation for a function.
        Examples: `, *.
      • operator — Punctuation for an operator.
        Examples: (, ).
      • package (wildcard) — Punctuation for a package.
        Examples: ., *.
      • annotation — Punctuation for an annotation.
        Examples: @ (Java), #![] (Rust).
      • decorator — Punctuation for a decorator.
        Example: @ (Python).
      • tag — Punctuation for a tag (HTML).
        Examples: <, />.
      • selector (wildcard) — Punctuation for a selector (CSS).
        Examples: *, ., #, :, ::, [, ].
    • operation — Punctuation to operate on tokens.
      • variadic — Punctuation to operate on variadic arguments.
        Examples: ..., *, **.
      • return — Punctuation to operate on lambda parameters.
        Example: -> in (parameter) -> expression.
    • association — Punctuation to associate values to tokens.
      • pair — Punctuation to associate an expression to a key.
        Examples: : in key: value, => in None => println!("Nothing").
      • iterator — Punctuation to associate an expression to an iterator.
        Example: : in auto& iterator : items.
    • accessor — Punctuation to access contained entities.
      • member — Punctuation for member access.
        Examples: ., ->.
      • scope — Punctuation for scope resolution.
        Example: ::.
    • delimiter — Punctuation to delimit tokens.
      • string — Punctuation to delimit tokens in a string.
        • [part] — Punctuation to delimit a part of a string.
          • interpolation — Punctuation to delimit an interpolation.
            Examples: #{, ${, }.
          • placeholder — Punctuation to delimit a placeholder.
            Examples: {, }, %, %(, ).
          • format — Punctuation to delimit a format specifier.
            Example: :.
        • regexp — Punctuation to delimit tokens in a regular expression.
          • [part] — Punctuation to delimit a part of a regular expression.
            • group — Punctuation to delimit a group.
              Examples: (?:, (, (?P, ).
            • lookaround — Punctuation to delimit a lookaround.
              Examples: (?=, (?!, ).
            • disjunction — Punctuation to delimit a disjunction.
              Example: |.
            • set — Punctuation to delimit a set.
              Examples: [^, [, ].
            • mode — Punctuation to delimit a mode specifier.
              Examples: (?, ).
      • comment — Punctuation to delimit tokens in a comment.
        • [part] — Punctuation to delimit a part of a comment.
          • caption — Punctuation to delimit a caption.
            Examples: <, >, :.
          • term — Punctuation to delimit a documented entity.
            Examples: { and } in {type}.
      • parameters — Punctuation to delimit parameters.
        Examples: (, ).
      • arguments — Punctuation to delimit arguments.
        Examples: (, ).
      • subscript — Punctuation to delimit a subscript.
        Examples: [, ].
      • type — Punctuation to delimit a type or return type.
        Examples: <, :, ->, (, ).
      • body — Punctuation to delimit a body.
        Examples: {, }, :.
      • statement — Punctuation to delimit a statement.
        Examples: {, }, :.
      • expression — Punctuation to delimit an expression.
        Examples: (, ).
      • embedded — Punctuation to delimit embedded code.
        Examples: ~~~, <%=, %>.
      • package — Punctuation to delimit package imports or exports.
        Examples: (, ), {, }.
    • separator — Punctuation to separate similar tokens.
      Examples: ,, \.
    • terminator — Punctuation to terminate a statement.
      Example: ;.
invalid
  • [invalid] — An invalid token. Appendable to every class.
    • deprecated — A deprecated token which should no longer be used.
    • illegal — An illegal token which doesn't belong there.
meta
  • meta — A larger part of the document encompassing multiple tokens.
    • function — A function definition block.
    • class — A class definition block.
    • embedded — A language embedded in another.

Related Pull Requests

The following themes and grammars are ready to be updated with the naming conventions. Screenshots showing the syntax highlighting improvements are attached in each PR.

@chbk
Author

chbk commented Oct 23, 2019

2021-02-01: This is now ready for review.

To preview the changes brought by the PRs, follow this procedure:

# Create a directory to host the PRs
mkdir naming-conventions
cd naming-conventions

# Install the PRs
for grammar in language-c language-css language-go language-html language-javascript language-python language-ruby; do
  git clone --depth 1 --branch scopes https://github.com/chbk/$grammar.git
  cd $grammar
  apm install
  cd ..
done

# Add the PRs to Atom
for grammar in language-c language-css language-go language-html language-javascript language-python language-ruby; do
  apm link $grammar
done

To remove the PRs from Atom:

cd naming-conventions
for grammar in language-c language-css language-go language-html language-javascript language-python language-ruby; do
  apm unlink $grammar
done

@adrian5
Contributor

adrian5 commented Nov 7, 2019

Awesome man, doing god's work!

As an aside, I'm wondering if we could add a repository that contains complete code snippets of all (possible) programming languages, covering the full range of available syntax. This would be a boon to syntax-theme developers, even if the syntax-scopes were a moving target.

@chbk
Author

chbk commented Nov 7, 2019

That would be nice, although covering every programming language would be tedious. I'm actually creating snippets to test the grammars listed above and I intend to include them in upcoming pull requests. If you want to create a central repository for code snippets I'd be glad to contribute mine.

Another valuable resource for theme developers is the syntax theme template, which should be updated once the conventions are stable. Ideally, just a couple tweaks to the template should produce the desired coloring consistently across every language. That means a developer would only need to test his theme in a couple languages to be confident it works the same way in every language.

@jeff-hykin

jeff-hykin commented Aug 8, 2020

Many of the challenges with existing Textmate scopes are the problems of gray areas.

For example consider the 'variadic sizeof' in C++ which is sizeof.... It's an operator, but it is not alphanumeric and it's also not symbolic. It's kind of both. Similarly is a lambda a function definition or a variable? It's definitely both.

Atom can theoretically handle these things quite well because of its use of CSS scopes which don't care about ordering. For example having both a variable and a function scope/class will allow either of them to be matched, and allow both to be matched with .variable.function. I've been grappling with this problem for years on VS Code, and I genuinely believe the pure-TextMate version can be significantly improved but can never be solved, because of 1. strict ordering, 2. priority given to the lowest scope, and 3. the lack of a direct-child operator. I believe fixing any one of those could break the unsolvable nature, and Atom with CSS has fixed all three.

My question here is; do you want to make a standard for all TextMate implementations, or is it okay if we take advantage of Atom's implementation?

@jeff-hykin

jeff-hykin commented Aug 8, 2020

To ensure this standard never has the issue of the original Textmate standard I think it should be worded as if/case statements.

For example:

if the token cannot be used as a variable name
    then
         it should have `keyword`
    and if all characters in the token are non-alphanumeric
        then
            it should have `keyword.symbol`
[etc]

else (if none of these cases fit)
     use the `other`

Even this small example exposes the ambiguity of punctuation not being able to be used as variable names, and calls into question the difference between punctuation and keywords. There is a well defined difference, but the current standard doesn't make the difference clear. (I'll address this more when I discuss specific criticisms).

@jeff-hykin

Something to consider (I haven't seen it mentioned yet in any of the PR's), is releasing all the language and theme changes as a bundled extension to get feedback. Then after some time and feedback it can be made the default, and all the old language-scopes and themes can be bundled into a "legacy scoping" extension for those who really want the old style.

It's just pretty awful when crunching to meet deadlines, and then an auto-update suddenly makes you feel blind because the colors have all changed. While on the other hand, people looking for an extension like this will be elated when they find something that does a better job at highlighting across the board.

@chbk
Author

chbk commented Aug 10, 2020

My question here is; do you want to make a standard for all TextMate implementations, or is it okay if we take advantage of Atom's implementation?

I'll clarify the context of this PR and hopefully that will answer your question.

This documentation proposal is based on the Textmate naming conventions because that's what themes and grammars use, for lack of a suitable alternative. Atom defines itself as a hackable editor, yet creating a theme/grammar that's interoperable with other themes/grammars is too laborious: the Textmate scopes are insufficient and there is no Atom documentation.

Recently, Atom adopted Tree-sitter to parse source code. It works great and surpasses regex parsers in many ways: it's smarter, faster, and more robust. For end users to truly appreciate this change though, we need to bridge the gap between the parser and the interface, with extensive naming conventions for grammar and theme developers.

This is where this PR comes in. Its goal is to correct and extend the Textmate conventions to make something practical for Atom. Once a clear set of conventions is endorsed and implemented, other editors that want to switch to Tree-sitter can fork the Atom grammars and benefit from them. In the long run, it's a win for everyone.



For example consider the 'variadic sizeof' in C++ which is sizeof.... It's an operator, but it is not alphanumeric and it's also not symbolic. It's kind of both.

Taking specific examples to see how they fit in is a good way to validate the documentation. The symbol scope is for non-alphanumeric keywords, so it doesn't apply to sizeof... as a whole. However, sizeof... is the combination of two operators: sizeof and .... It can be split into separate tokens. The Tree-sitter grammar does exactly that, so each token gets its own scopes, respectively keyword.operator and keyword.operator.symbol.
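
As a rough illustration of the split (and only that: the Tree-sitter grammar does not work this way), a small Python sketch can separate sizeof... into an alphanumeric token and a symbolic token and attach the two scopes mentioned above. The regular expression and the function name are assumptions made for this example.

```python
import re

# Assumed tokenizer for illustration: a word, or a run of
# non-word, non-space characters.
TOKEN = re.compile(r"[A-Za-z_]\w*|[^\w\s]+")


def scope_operator_tokens(text: str) -> list:
    """Split operator text into tokens and pair each one with a scope."""
    scoped = []
    for token in TOKEN.findall(text):
        if any(ch.isalnum() for ch in token):
            scope = "keyword.operator"
        else:
            # No alphanumeric characters: the symbolic variant applies.
            scope = "keyword.operator.symbol"
        scoped.append((token, scope))
    return scoped


print(scope_operator_tokens("sizeof..."))
```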



Similarly is a lambda a function definition or a variable? It's definitely both.

Scopes that are compatible with others are indicated in brackets in the documentation proposal. entity.function and entity.variable aren't considered compatible. The distinction between a function and a variable has to be clear for themes to correctly highlight them. Similarly, you could say a string literal is also a constant, but again the distinction is clear in the conventions to resolve the highlighting ambiguities that would follow.



To ensure this standard never has the issue of the original Textmate standard I think it should be worded as if/case statements.

I'll argue that the conventions can already be read as a series of conditions: if a definition fits your token, apply the scope. There are several benefits to presenting them as a list: it avoids complicated nesting of if/else statements, authorized overlaps are explicitly indicated in brackets, it's easier to skim through to find a fitting scope, and the main scopes can be highlighted to stand out.



Something to consider (I haven't seen it mentioned yet in any of the PR's), is releasing all the language and theme changes as a bundled extension to get feedback.

Good point, although bundling all the themes and grammars in a single package might not be feasible. I'll add some instructions in the main post for those who want to preview the PRs.

@jeff-hykin

jeff-hykin commented Aug 11, 2020

Browsing themes yesterday, I found Styri and thought I recognized the profile picture. Nice job I like it 😁 👍

So after reading your post, I'm thinking the answer is yes (answer to "can we take advantage of Atom's [e.g. TreeSitter] implementation"). Is that okay / a correct assumption? Because of this you can completely forget about my comment about an if-statement hierarchy. That was only under the assumption that TextMate scopes were going to be used.

You'll be happy to hear I'm very familiar with the Tree Sitter and TextMate! I was actually the first to get the Tree sitter working cross platform in a VS Code extension (using Max Brunsfeld's WASM build). That extension works terribly btw, but Max's work is amazing.





although bundling all the themes and grammars in a single package might not be feasible

Just create an empty extension that lists all the language & theme extensions as dependencies! 👍 (assuming you've published the themes/languages as individual extensions) I learned that trick last week. I'm happy to help out in doing that. I just published a language-javascript extension last week (because the this keyword didn't have any scopes, and JSX was missing several scopes).





  1. entity.function and entity.variable aren't considered compatible. The distinction between a function and a variable has to be clear for themes to correctly highlight them.
  2. if a definition fits your token, apply the scope.

These seem to conflict with each other, BUT hear me out.

Say I want all my variables bold and I want all my functions blue.

int i_should_be_bold = 10;
void i_should_be_blue() {};
auto i_should_be_bold_AND_blue = [ ]() -> int { return 10; };

I have an Atom theme that styles them correctly AND does not make a mutually-exclusive distinction. In fact I can ONLY style the third case correctly because they are allowed to overlap. I fully agree with "if a definition fits your token, apply the scope" 👍 and I think that statement should be the foundation for the entire standard. The statement never causes a conflict with itself, it is easy to remember, and it can be applied to any language concept including concepts that have yet to be invented! Following that logic, since a lambda meets the definition of both a variable and a function, both scopes are applied.
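
To make the overlap concrete, here is a minimal Python sketch of how class-based rules combine when a token carries both scopes. The rule list, class names, and properties are assumptions chosen to mirror the bold/blue example; this is not code from an actual theme.

```python
# Each rule pairs a set of required classes with the properties it sets.
# Class names and properties are illustrative, not taken from a real theme.
RULES = [
    ({"variable"}, {"font-weight": "bold"}),
    ({"function"}, {"color": "blue"}),
]


def resolve_style(token_classes):
    """Merge the properties of every rule whose classes are all on the token."""
    style = {}
    for required, properties in RULES:
        if required <= token_classes:  # subset test: all required classes present
            style.update(properties)
    return style
```

A token with the classes entity and variable comes out bold, one with entity and function comes out blue, and one carrying variable and function together gets both properties. If two rules set the same property, the later rule wins in this sketch, which is where a documented priority would matter.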

Not to say that any hierarchy is bad, but any strictly-hierarchical implementation is going to be fundamentally flawed just like TextMate scopes. The Tree Sitter deserves a better standard.




I think you and I are on the same page in that: the standard should define what happens when there is a conflict (such as variables=red, functions=blue). In that case, I think we can both agree the .function class would take priority and we should document this so that theme maintainers don't have to worry about it.




Sizeof...

However, sizeof... is the combination of two operators: sizeof and ...

Haha you'd think that, but actually the specification defines it as a single operator

"Variadic templates use the sizeof...() operator (unrelated to the older sizeof() operator):"

See here and here

(I maintain the C++ TextMate grammar for VS Code, so I've spent an ungodly amount of time on these things.) Not only is the sizeof...() a single operator, but the parentheses themselves are also part of the operator! (E.g. sizeof...(Ts) works, but sizeof...((Ts)) does not, and sizeof... Ts also does not; you can test it online in that first link.) Even more strangely, despite it still being parsed as a single token/operator, gcc does allow for spaces between the sizeof and the ....

While there is no ... operator, there is however ... punctuation in C++ called parameter packing; there is also a separate form of ... punctuation for variadic arguments.

C++ is quite convoluted haha. The tree sitter is just making a benign mistake.




String literals

Similarly, you could say a string literal is also a constant

Ruby has both immutable and mutable strings, so the string itself isn't really inherently constant; even Ruby's number literals can be mutable! JavaScript can actually redefine undefined as a function parameter, so it's not constant either haha (but that isn't true for null in JavaScript). So I'd actually say that both numbers and strings are .literal and neither are inherently constant. Same with date literals (yaml), file-path literals (nix lang), symbols/atoms (ruby/elixir), regex, and many other edge cases that people don't think about.

Once upon a time, I thought loop (in Ruby) was colored incorrectly. It was being highlighted as a built-in function when I thought it was a keyword. ... Nope; turns out loop is indeed just a function, despite feeling like a keyword, but I only learned that because of good syntax highlighting and theming.

Calling all literals constant actually can detract from people's understanding of what they are reading.




In a similar vein, regex has true conditional statements which should be scoped as control flow. However, regex anchors do not control flow (and I believe they shouldn't be scoped as control flow).

To make it easier on themes, the tools available in Less (the stylesheet language) can be used to create fallbacks. If no style is given for regex anchors, then they can fall back on the style given to the control flow scope. But that doesn't mean regex anchors should be scoped with an obviously incorrect scope. Just imagine if HTML tags used variable because most languages didn't have the concept of a "tag".




I really believe we can make a standard where there are no major/true "well... you could say"'s. Not only possible, but easy thanks to CSS and the tree sitter. Do you think it is possible?

@chbk
Author

chbk commented Aug 12, 2020

So after reading your post, I'm thinking the answer is yes (answer to "can we take advantage of Atom's [e.g. TreeSitter] implementation"). Is that okay / a correct assumption?

Yes, I don't see why we shouldn't let others use it.



I was actually the first to get the Tree sitter working cross platform in a VS Code extension (using Max Brunsfeld's WASM build).

That's great! I think VS Code would benefit a lot from Tree-sitter as well.



Just create an empty extension that lists all the language & theme extensions as dependencies!

Nice! I'll try that out.



I really believe we can make a standard where there are no major/true "well... you could say"'s. Not only possible, but easy thanks to CSS and the tree sitter. Do you think it is possible?

It's something to aim for. The conventions should settle the ambiguities so developers don't struggle with them. There's a consensus to reach between languages so developers can achieve consistent highlighting within and across them.



Since a lambda meets the definition of both a variable and a function, both scopes are applied.

I disagree with this. I'd rather add a lambda scope to entity.function. It would be more explicit and enable the kind of customization you want. Allowing a token to be scoped both entity.function and entity.variable will just cause confusion, especially if we have to specify in what order they should appear in a CSS stylesheet for a theme to work correctly.



Not only is the sizeof...() a single operator, but the parentheses themselves are also part of the operator!

You can write sizeof ... () with whitespaces in the middle, so I believe the Tree-sitter grammar made the right choice by splitting it into separate tokens.



While there is no ... operator, there is however ... punctuation in C++

Thanks for pointing that out, I'll change the scope to punctuation.



So I'd actually say that both numbers and strings are .literal and neither are inherently constant. Same with date literals (yaml), file-path literals (nix lang), symbols/atoms (ruby/elixir), regex, and many other edge cases that people don't think about.

I agree with you, having a literal scope would make sense. However, in the Textmate documentation, constant is the conventional scope for numbers, characters, and primitive values so I kept it. Adding the mutable and immutable scopes fills the gap without needing to rename and disrupt constant.



In a similar vein, regex has true conditional statements which should be scoped as control flow. However, regex anchors do not control flow (and I believe they shouldn't be scoped as control flow).

I used this as a reference for the documentation but I'm open to suggestions for better scopes.

@jeff-hykin

jeff-hykin commented Aug 14, 2020

I think .lambda is a perfectly valid solution as well! It's certainly much closer to how people actually talk about it. I do want to point out though, there are still going to be a lot of weird "it meets the definition" edge cases. For example

const auto a_var = []() mutable {};

That is a global constant mutable lambda haha. In which case I still think it deserves both the constant and lambda tags.

I'm fine with things like meta, that don't really make sense but can be reused from convention. And for numbers and strings I would be fine with .constant, except that constant already has a meaning (like in my example above). π, e, MAX_INT_SIZE are regarded as constants by both programmers and mathematicians, and essentially nobody ever refers to a literal as "that constant". What if you (a theme maker) want to tag all literals but not constants? You'd have to do constant:not(every-single-possible-scope-for-a-literal)

While there could be an additional ".literal" so that :not(.literal) works, the theme maintainer isn't going to know that immediately. And if they learn it from trial-and-error (very bad, very discouraging) they're going to think "okay literals were an exception, what else is going to be an exception? classes? containers? keywords?" They're going to have to learn the entire naming scheme before they can style something with confidence.

I used this as a reference for the documentation

Okay I'll take a look at it and see what I can come up with 👍

Before I plan specific suggestions though, I'm curious about the final verdict on "if it meets the definition, add it".

Specifically the code example above where mutable describes the function (impure function) and constant describes the variable (cannot be reassigned). I think there could be a simple targeted adjective system, but I'm not sure if you're interested in that.

@chbk
Author

chbk commented Aug 14, 2020

except that constant already has a meaning

Different languages call similar things in different ways, there's a lot of diversity. The conventions just reach a consensus: tokens that are "constant", "final", "read-only", etc. are scoped immutable.



I'm curious about the final verdict on "if it meets the definition, add it".

The conventions are a tree, so simply pick the branch whose definition fits your token. Overlaps are explicitly stated in brackets, they can be read as extensions of the branch you pick.
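
As a sketch of what "pick a branch" means in practice, the conventions can be modeled as a nested dictionary and a scope checked by walking it. The tree below is a small subset assembled from scopes mentioned in this thread, not the full proposal, and the helper name is an assumption made for this example.

```python
# A small, illustrative subset of the scope tree; the real proposal is larger.
SCOPE_TREE = {
    "keyword": {
        "control": {},
        "operator": {"symbol": {}},
        "type": {},
        "storage": {"declaration": {}},
    },
    "entity": {
        "function": {"lambda": {}},
        "variable": {},
        "type": {},
    },
}


def is_branch(scope):
    """Return True if the dotted scope follows an existing branch of the tree."""
    node = SCOPE_TREE
    for part in scope.split("."):
        if part not in node:
            return False
        node = node[part]
    return True
```

Here is_branch("entity.function.lambda") is True, while is_branch("entity.function.variable") is False, because variable is a sibling branch of function rather than an extension of it.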

@jeff-hykin

jeff-hykin commented Aug 15, 2020

Immutable, although somewhat unusual, works for me in terms of consistency (kind of like picking one term in self vs. this, property vs. attribute, or method vs. member-function). How would you scope that lambda though?

const auto a_var = []() mutable {};

.immutable.lambda.mutable?

I absolutely agree with the diversity, but I see additional diversity as making it more difficult for an effective hierarchy to exist.

@chbk
Author

chbk commented Aug 16, 2020

I wonder whether scoping a function as immutable is relevant, the const keyword doesn't seem to have any effect here. Also, the mutable keyword denotes that the function can modify immutable values, not that the function is modifiable itself. So in all simplicity, a_var should be entity.function.lambda. If you specifically want to differentiate lambdas that modify immutable values, that's outside the range of this documentation, but you are free to add an additional non-conflicting scope in your grammar, perhaps impure would be a fitting choice.

@jeff-hykin

jeff-hykin commented Aug 16, 2020

that's outside the range of this documentation

Understandable, not every edge case should be directly covered in the standard. My intent was not solving this specific case but rather using it as an example of many situations with conflicting definitions. We can't really judge if it is "relevant" because it's not only for us. Mutability is very important in Elixir and Rust. Things like Rust's "unsafe" could be the defining line for a banking system. Just blanket saying "___ doesn't matter" is a pretty bold statement.

Language servers (not just the tree sitter) will also be using this same scoping standard, and they're going to have a lot more scopes describing variables and functions (async, safe/unsafe, pure/impure, thread-safe, signal-safe, exceptionless, capturing/non-capturing, const, constexpr, member/non-member, friend, public/private/protected, etc).

Back on topic though: for another example of overlapping definitions, consider Haskell where the operators themselves are entities that are passed around as values, yet they are also still operators. Or consider the opposite case in Haskell, where functions can be used as infix operators when inside backticks: 1 `plus` 2

Definitions like

"all characters in the token are alphanumeric"

And

"At least one character in the token is not alphanumeric"

are a partition.

Same with

  1. "all characters in the token are alphanumeric"
  2. "all characters in the token are not alphanumeric"
  3. "Everything else"
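
A minimal Python sketch of the three-way version, assuming non-empty tokens and using Python's notion of an alphanumeric character; the class labels and function name are made up for this example. Every token lands in exactly one class, which is what makes it a partition.

```python
def classify(token):
    """Assign a non-empty token to exactly one of three classes."""
    alnum = [ch.isalnum() for ch in token]
    if all(alnum):
        return "alphanumeric"  # all characters are alphanumeric
    if not any(alnum):
        return "symbolic"      # no character is alphanumeric
    return "mixed"             # everything else
```

With these definitions, "sizeof" is alphanumeric, "..." is symbolic, and "sizeof..." is mixed; no token can satisfy two of the conditions at once.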

My main point is; the current hierarchy isn't a partition. There are gray areas exactly like the existing Textmate guidelines. Once grammar makers/language-servers reach those gray areas, they're going to make up their own solution. Two people make their own solution and boom: competing naming conventions exist (no standard), themers will have to memorize them, grammar maintainers have to debate whose is better; and we are right back at square 1. The design of the standard should be capable of dealing with concepts for languages that haven't even been invented yet.

TLDR; Lambda is a fix for the specific case where the definitions (of variable and function) overlap, but what about all the other possible overlaps?

Style Bleeding

A second (more practical) issue is that the CSS classes don't support a hierarchy. For example styling .storage.keyword will accidentally also apply that same style to .keyword.operator.storage (an operator like "sizeof" that assesses the amount of storage).
Something such as .entity.type.storage (a definition of a memory layout on an embedded system) would accidentally get all the same styles as .storage.type.entity, because in CSS order doesn't really matter.
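
The bleeding can be reproduced with a small Python sketch of compound class selectors. It assumes each dotted scope segment becomes one plain CSS class (ignoring any prefix an editor may add), and both helper names are hypothetical.

```python
def scope_to_classes(scope):
    """Turn a dotted scope into a space-separated class list (assumed mapping)."""
    return " ".join(scope.split("."))


def selector_matches(selector, class_attribute):
    """Return True when every class in a compound selector is on the element.

    Class order is irrelevant, as with compound class selectors in CSS.
    """
    required = set(selector.strip(".").split("."))
    present = set(class_attribute.split())
    return required <= present
```

In this model, the selector .storage.keyword matches a token scoped keyword.operator.storage, and .storage.type.entity matches a token scoped entity.type.storage, even though neither token sits in the storage branch.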

To have the scopes function as an actual hierarchy and not "bleed" style into other parts of the hierarchy, the scopes for something like storage keyword [declaration] variable would have to look like

.storage
.storage--keyword
.storage--declaration
.storage--keyword--variable

Which is a viable solution
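
A sketch of how such class names could be generated, in Python. The function name is hypothetical, and attaching the bracketed (optional) scope directly to the root is an assumption read off the list above rather than a rule stated anywhere.

```python
def hierarchical_classes(path, extensions=()):
    """Build one class per level of the path, each carrying its full ancestry."""
    parts = path.split(".")
    classes = ["--".join(parts[:depth]) for depth in range(1, len(parts) + 1)]
    # Bracketed (optional) scopes are attached to the root here; an assumption.
    classes += [f"{parts[0]}--{name}" for name in extensions]
    return classes
```

Calling hierarchical_classes("storage.keyword.variable", ["declaration"]) produces the four classes listed above (in a different order), while hierarchical_classes("keyword.operator.storage") contains no storage--keyword class, so a rule for .storage--keyword can no longer bleed into it.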

Non-conflicting language specific scopes could also be created such as

.storage
.storage--keyword
.storage--declaration
.cpp--auto

or

.lambda
.cpp--mutable
.cpp--function

@chbk
Author

chbk commented Aug 22, 2020

It sounds like you would like to scrap the current hierarchy for a flat list of scopes instead. I've given it some thought but ultimately it's best to keep this hierarchy, for a couple reasons. First, it guarantees that scopes from one category won't conflict with scopes from other categories. You can't scope something entity.type.storage because that conflicts with storage which is a higher-order scope. But you can scope something punctuation.definition.function and it won't conflict with entity.function which is another category. The predefined structure helps themes target exactly what they intend by categorizing their highlighting rules, instead of having to foresee every combination of scopes. Secondly, the Textmate conventions are already prevalent in so many themes and grammars. The purpose of this PR shouldn't be to introduce a radical reform, but to solve the issues with the existing conventions for something applicable right away. I think that maximizes its chance to be accepted. The related grammar PRs and theme PRs show that the hierarchy works in a practical way. I believe this can become an effective standard.

That said, I welcome your insights on ambiguities that aren't covered by this PR yet. Specifically for your example in Haskell, the plus in 1 `plus` 2 can be scoped entity.function(.infix) which fits with the PR as it is. However, there is no evident scope for a user-defined operator such as =+= in 1 =+= 2. keyword.operator would obviously be misleading and entity.function would cause some irregular highlighting. So I added the entity.operator scope, which was admittedly missing.

@jeff-hykin

Sorry, I don't mean to be so pessimistic. So far you've done a great job addressing the issues I bring up. Lots of your solutions are good answers I hadn't considered (which might be the case with the last two issues I brought up too).

I'm bringing these up because breaking backwards compatibility is going to be painful. If the change doesn't fix everything now, I think people are going to be really against doing a second breaking change. If the fix doesn't clearly address all the issues, I think people will be less on board with the changes.

While I think a flat system would solve the problem, it would basically be a full rewrite, which isn't great. I'm open to a partitioned hierarchy. However, whichever system is picked, I don't think gray areas should be allowed.

Okay back to the details

You can't scope something entity.type.storage because that conflicts with storage which is a higher-order scope. But you can scope something punctuation.definition.function and it won't conflict with entity.function which is another category. The predefined structure helps themes target exactly what they intend by categorizing their highlighting rules, instead of having to foresee every combination of scopes.

So you're saying the standard ensures that names don't end up in multiple places of the hierarchy? (Which would prevent the style "bleeding" problem)

I'm a bit confused with the entity.operator though, wouldn't that cause an overlap with keyword.operator? (I think there might be a point you're making that I'm not understanding).

@jeff-hykin

jeff-hykin commented Aug 24, 2020

But you can scope something punctuation.definition.function and it won't conflict with entity.function which is another category

Note: If someone styles .function it will end up conflicting by styling both punctuation.definition.function and entity.function

@chbk
Author

chbk commented Aug 25, 2020

Yeah ensuring backward compatibility is tough, especially when there isn't a definite standard to work with besides the scarce Textmate conventions. I'm glad you're passionate about this and I value your input and suggestions for different approaches. I think the best way to understand the hierarchy is to check out the updated template theme stylesheet. It shows how to structure a theme by separating scopes into categories.

The idea is that having a rule just for operator or just for function is too general, so you put them into categories that clarify your intent, like keyword or entity. It's already the case with most themes, you'd see a keyword.operator rule or entity.function rule. But the scopes weren't clearly organized in the Textmate documentation so I brought that all together in this PR. Another benefit is that categories also act as fallbacks. Having an entity styling rule provides a default for other identifiers.
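
The fallback behavior can be sketched in Python as a longest-prefix lookup. This is a simplification of how a stylesheet resolves rules, and the theme table, colors, and function name are assumptions made for this example rather than part of the template theme.

```python
# Hypothetical theme rules keyed by scope prefix; the colors are placeholders.
THEME = {
    "entity": "white",
    "entity.function": "blue",
    "keyword": "purple",
    "keyword.operator": "red",
}


def color_for(scope):
    """Return the color of the most specific rule that prefixes the scope."""
    parts = scope.split(".")
    for depth in range(len(parts), 0, -1):
        prefix = ".".join(parts[:depth])
        if prefix in THEME:
            return THEME[prefix]
    return None  # no category rule applies
```

Here entity.function.lambda picks up the entity.function rule, entity.variable falls back to the entity category, and punctuation.definition.function gets nothing because no punctuation rule is defined.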

@jeff-hykin

jeff-hykin commented Feb 12, 2021

Thanks for the screenshots (and sorry I missed them in that other PR)

If we can add a prefix to the scopes, this PR has my full support. And I'm happy to help add them, you've put in a ton of work already. Would you be okay with them being added?

For example, let's say the prefix is std- then the old

   '"if"': 'keyword.control' 

Becomes (and note: the std-. isn't a typo)

   '"if"': 'keyword.control std-.std-keyword.std-control.std-condition' 

Instead of

   '"if"': 'keyword.control.condition'

It's not overly beautiful, but there are 5 really strong benefits

  1. The std-. at the start gives it the highest priority, meaning Sublime/VS Code and other systems will be able to use this standard and be backwards compatible (I can add this to VS Code's C++ and Go syntax right now to help propagate your new standard hierarchy)
  2. The std- in front of every name prevents any backwards compatibility issues in Atom. Since every scope has a CSS name, they each need a differentiator.
  3. The std- makes it really clear to someone who is reading a grammar or forking a theme, that there is a standard system for these names
  4. Having a prefix allows for there to be a clear transition; we can see if a theme/grammar is using the new standard, or if they're not. We can also see if std- is being misused (the old methods were misused all the time).
  5. If a language has a concept that only exists in that language, such as the adverbs of J lang or a go-routine, then they can follow the prefix pattern by having their own language-specific prefix without misusing tags that are designed to be cross-language compatible.
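
The prefix mapping from the "if" example above can be sketched as a small Python helper. The function name and its parameters are hypothetical; only the input and output strings come from the example.

```python
def with_std_prefix(legacy_scope, standard_scope, prefix="std-"):
    """Append a prefixed copy of the standard scope to the legacy scope."""
    prefixed = ".".join(prefix + part for part in standard_scope.split("."))
    # The bare "std-." segment leads the prefixed scope, as in the example.
    return f"{legacy_scope} {prefix}.{prefixed}"
```

Calling with_std_prefix("keyword.control", "keyword.control.condition") returns 'keyword.control std-.std-keyword.std-control.std-condition', matching the grammar entry shown above.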

If/when this PR is merged, I'd like to open a git repo (you @chbk as owner) with the hierarchy in the Readme as an official standard. That way proposed additions to it can be done as PRs, and we can refer other people to it. I'll link to it in the language grammars I maintain.

sadick254 pushed a commit to atom/atom that referenced this pull request Feb 19, 2021
Update of the default syntax themes to implement [naming conventions](atom/flight-manual.atom.io#564) for syntax scopes.

Adds the [template](atom/apm#883) to each theme with custom colors, to accommodate the naming conventions. There should be no compatibility break with existing grammars.

As naming conventions are implemented in more language grammars, their old specific stylesheets can be retired.
@chbk
Author

chbk commented Feb 19, 2021

Nice, it's great if you can get these scopes into other grammar packages. I'm happy with anything that brings us a usable standard. Your prefix idea is feasible, however, the scopes documented in this PR really aren't that radically different from the TextMate scopes. You'll end up with a lot of std- repetitions of already present scopes. Also, if the intent was to write a new standard from scratch and use it alongside the TextMate scopes, I would have taken a different approach (for example I would have renamed and reorganized the constant scope, among other things). In short I see this PR as an extension to TextMate scopes rather than a completely new standard.

BTW the majority of scopes here aren't mandatory, you can have a very functional grammar/theme using only the bold scopes. The other scopes are here to provide plenty of examples and can be used if needed. So another possibility would be to just add some of the bold scopes to grammar packages, and then later complete them and weed out the old irrelevant scopes. This would help themes that rely on the obsolete name scope and the variable scope (some themes assume it's only used for noteworthy variables, even though there is no such restriction in the original TextMate documentation).

If you want to open a repo for this, go for it 👍
Also be sure to check out the Markdown file that's included in this PR.
